Designing spelling correctors for inflected languages using lexical transducers

Authors

  • Izaskun Aldezabal
  • Iñaki Alegria
  • Olatz Ansa
  • Jose Maria Arriola
  • Nerea Ezeiza
Abstract

This paper describes the components used in the design of the commercial Xuxen II spelling checker/corrector for Basque. It is a new version of the Xuxen spelling corrector (Aduriz et al., 97) that uses lexical transducers to improve the process. An important new feature is the use of user dictionaries whose entries allow both the original and the inflected forms to be recognised. In highly inflected languages such as Basque, spelling checking cannot be resolved without adequate morphological treatment of words. Beyond this, the morphological treatment has other important features: coverage, reusability of tools, orthogonality and security. The tool is based on lexical transducers and is built using the fst library of Inxight. A lexical transducer (Karttunen, 94) is a finite-state automaton that maps inflected surface forms to lexical forms. It can be seen as an evolution of two-level morphology (Koskenniemi, 83) in which the use of diacritics and homographs can be avoided and the intersection and composition of transducers are possible. In addition, the process is very fast, and the transducer for the whole morphological description can be compacted into less than 1 Mbyte. The design of the spelling corrector consists of four main modules:
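As an illustration of the lexical transducer idea described in the abstract (a mapping from inflected surface forms to lexical forms), the following is a minimal toy sketch in Python. It is not the Xuxen II implementation and does not use the Inxight fst library; the class `ToyTransducer`, the helper `build_toy`, and the tags `+Noun`, `+Det`, `+Sg`, `+Pl` are all hypothetical names chosen for this example. It covers only the Basque stem "etxe" (house) with two endings.

```python
from collections import defaultdict


class ToyTransducer:
    """A tiny finite-state transducer mapping surface forms to lexical forms.

    Hypothetical illustration only; not the transducer used by Xuxen II.
    """

    def __init__(self):
        # state -> list of (surface_symbol, lexical_output, next_state)
        self.arcs = defaultdict(list)
        # final state -> lexical string emitted when the word ends there
        self.finals = {}

    def add_arc(self, src, surface, lexical, dst):
        self.arcs[src].append((surface, lexical, dst))

    def add_final(self, state, lexical=""):
        self.finals[state] = lexical

    def analyse(self, word, start=0):
        """Return every lexical form that the transducer pairs with `word`."""
        results = []
        stack = [(start, 0, "")]  # (state, position in word, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word):
                if state in self.finals:
                    results.append(out + self.finals[state])
                continue
            for surface, lexical, dst in self.arcs[state]:
                if word[pos] == surface:
                    stack.append((dst, pos + 1, out + lexical))
        return sorted(results)

    def accepts(self, word):
        """Spell-check view: a word is accepted if it has at least one analysis."""
        return bool(self.analyse(word))


def build_toy():
    """Build a toy transducer for the stem 'etxe' with the endings -a and -ak."""
    t = ToyTransducer()
    state = 0
    for ch in "etxe":
        t.add_arc(state, ch, ch, state + 1)  # copy stem letters unchanged
        state += 1
    t.add_final(4, "+Noun")            # bare stem: etxe
    t.add_arc(4, "a", "+Noun+Det", 5)
    t.add_final(5, "+Sg")              # etxea
    t.add_arc(5, "k", "", 6)
    t.add_final(6, "+Pl")              # etxeak
    return t


if __name__ == "__main__":
    toy = build_toy()
    for w in ["etxe", "etxea", "etxeak", "etxek"]:
        print(w, toy.analyse(w), toy.accepts(w))
```

In this sketch a word counts as correctly spelled when `analyse` returns at least one lexical form, which is the sense in which morphological analysis and spelling checking coincide for an inflected language. A real lexical transducer would be compiled from a full lexicon and rule set rather than built arc by arc.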


Similar resources

A Two-level Morphological Analyser and Generator for Irish using Finite-State Transducers

Computational morphology is an important part of natural language processing. Finite-state techniques have been applied successfully in computational phonology and morphology to many of the world’s major languages. Celtic languages such as Modern Irish present challenging morphological features that to date have not been addressed using finite-state technology. This paper presents a finite-stat...


Lexical Analysis of Agglutinative Languages Using a Dictionary of Lemmas and Lexical Transducers

This paper presents a simple method for performing a lexical analysis of agglutinative languages like Korean, which have a heavy morphology. Especially, for nouns and adverbs with regular morphological modifications and/or high productivity, we do not need to artificially construct huge dictionaries of all inflected forms of lemmas. To construct a dictionary of lemmas and lexical transducers, f...


Using foma for language-based games

This paper describes two examples of how finite-state technology (FST), commonly used in computational morphology, can help implement language-based games. The tool we have used is foma, an open-source toolkit similar to previous Xerox/PARC finite-state tools. FST tools have been widely used to describe the morphology of languages and to implement spelling checkers and correctors, especially for ...


From Lexical Acquisition to Lexical Reusable Tools

Having as background the work in the definition and implementation of a system for the acquisition and management of reusable morphological and phrasal dictionaries, and the realization of a framework for the generation of different finite-state tools for an efficient and distributed use of the different functionalities defined in the system, we will present the overall system and focus the att...


Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...



Publication date: 1999